Oct. 31, 2018

Acknowledgements/References

Intro to Exploratory Data Analysis (EDA)

EDA is a creative process whose purpose is to explore your data in order to generate good questions about it. It is more qualitative than quantitative, and it is iterative: exploration inspires questions, which lead to more exploration, and so on.

“There are no routine statistical questions, only questionable statistical routines.” — Sir David Cox

A good starting place for questions:

  • What type of variation occurs within my variables?
  • What type of covariation occurs between my variables?

To answer these questions, it's crucial to know whether each variable is categorical or numerical.
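As a first pass at that classification, here is a minimal base-R sketch that labels each column of a data frame. The helper `classify_variables()` is hypothetical (not from these slides), and it assumes that numeric columns are numerical and everything else (factors, characters, logicals) is categorical. Numeric codes with only a few distinct values, such as `cyl` in `mtcars`, may still be better treated as categorical, so check the result by eye.

```r
# Hypothetical helper: label each column as "numerical" or "categorical".
# Assumption: is.numeric() columns are numerical; all other types are
# treated as categorical. This is a heuristic, not a definitive rule.
classify_variables <- function(df) {
  vapply(df,
         function(col) if (is.numeric(col)) "numerical" else "categorical",
         character(1))
}

# Built-in iris data: four numeric measurements plus the Species factor.
types <- classify_variables(iris)
print(types)
```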

The Motivation behind Visual EDA

## # A tibble: 4 x 6
##   set   `mean(x)` `sd(x)` `mean(y)` `sd(y)` `cor(x, y)`
##   <fct>     <dbl>   <dbl>     <dbl>   <dbl>       <dbl>
## 1 I             9    3.32      7.50    2.03       0.816
## 2 II            9    3.32      7.50    2.03       0.816
## 3 III           9    3.32      7.5     2.03       0.816
## 4 IV            9    3.32      7.50    2.03       0.817
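These statistics match Anscombe's quartet, which ships with base R as the `anscombe` data set (columns `x1` to `x4` and `y1` to `y4`). The code that produced the tibble above is not shown; the sketch below is a base-R reconstruction (dplyr's `group_by()` and `summarise()` would be the tidyverse equivalent). It computes the same per-set summaries and then plots each set, which is where the four sets turn out to look very different despite nearly identical numbers.

```r
# Reshape the wide built-in `anscombe` data into long form: one row per
# point, with a `set` factor (I to IV) and columns x and y.
sets <- c("I", "II", "III", "IV")
long <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(
    set = factor(sets[i], levels = sets),
    x   = anscombe[[paste0("x", i)]],
    y   = anscombe[[paste0("y", i)]]
  )
}))

# Per-set summary statistics, mirroring the columns of the table above.
summary_stats <- do.call(rbind, lapply(split(long, long$set), function(d) {
  data.frame(
    set    = d$set[1],
    mean_x = mean(d$x),
    sd_x   = sd(d$x),
    mean_y = mean(d$y),
    sd_y   = sd(d$y),
    cor_xy = cor(d$x, d$y)
  )
}))
print(summary_stats, digits = 3)

# Plot each set with its least-squares line: same numbers, different shapes.
op <- par(mfrow = c(2, 2))
for (s in sets) {
  d <- long[long$set == s, ]
  plot(d$x, d$y, main = paste("Set", s), xlab = "x", ylab = "y")
  abline(lm(y ~ x, data = d))
}
par(op)
```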

Before Getting Our Hands Dirty

Exploring One Numerical Variable

One Numerical and One Categorical Variable

One Numerical and One Categorical Variable (Cont.)

These plots use geom_jitter() and geom_quasirandom(), with the data set filtered to the New York region.
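The slides' data set is not shown, so the sketch below uses a small simulated stand-in: the data frame `prices` and its columns `region`, `type`, and `price` are hypothetical. It assumes the ggplot2 and ggbeeswarm packages are installed (`geom_quasirandom()` comes from ggbeeswarm). Both geoms address overplotting when a numerical variable is plotted against a categorical one: `geom_jitter()` adds random horizontal noise, while `geom_quasirandom()` offsets points using a quasirandom sequence whose spread follows the local density, giving a violin-like outline.

```r
library(ggplot2)
library(ggbeeswarm)  # provides geom_quasirandom()

# Hypothetical stand-in data (the original data set is not shown).
set.seed(42)
prices <- data.frame(
  region = rep(c("New York", "Chicago"), each = 100),
  type   = rep(c("conventional", "organic"), times = 100),
  price  = runif(200, min = 0.5, max = 2.5)
)

# Filter to a single region, as on the slide.
ny <- subset(prices, region == "New York")

# Random horizontal noise only (height = 0 keeps the prices unchanged).
p_jitter <- ggplot(ny, aes(x = type, y = price)) +
  geom_jitter(width = 0.2, height = 0)

# Density-aware quasirandom offsets within each category.
p_quasi <- ggplot(ny, aes(x = type, y = price)) +
  geom_quasirandom()

print(p_jitter)
print(p_quasi)
```

With `height = 0`, the vertical positions remain the true prices; only the horizontal placement within each category is perturbed.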

Two Numerical Variables

Exploring More than Two Variables

So far we have primarily used position and/or color mappings, but we can map additional variables to the following aesthetics:

Aesthetic   Description
x           X-axis position
y           Y-axis position
color       Color of points; outline color of other shapes
fill        Fill color
size        Diameter of points; thickness of lines
alpha       Transparency
linetype    Line dash pattern
label       Text drawn on a plot
shape       Shape of points
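A minimal ggplot2 sketch of mapping several variables at once, assuming ggplot2 is installed and using its built-in `mpg` data set (the choice of columns is illustrative, not from the slides). Two variables go to position, and three more go to color, size, and shape. Setting `alpha` inside `geom_point()` rather than inside `aes()` makes it a fixed value for every point instead of a mapping.

```r
library(ggplot2)

# Five variables from mpg, each mapped to a different aesthetic.
p <- ggplot(mpg, aes(x = displ, y = hwy,
                     color = class,   # categorical -> color
                     size  = cyl,     # numerical   -> point size
                     shape = drv)) +  # categorical -> point shape
  geom_point(alpha = 0.6)             # constant transparency, not mapped

print(p)
```

Stacking this many mappings on one plot quickly becomes hard to read, so it is worth checking which aesthetics actually help the comparison you care about.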

Continuous Variable Visual Effectiveness